impr(polyglot): Add new structure (wordsMap) to PolyglotWordset to get words from any language by uniform distribution (@IliyaZinoviev) #7440
Pull request overview
Adjusts polyglot word generation so languages are sampled (approximately) uniformly, rather than being dominated by languages with larger dictionaries.
Changes:
- Refactors `PolyglotWordset` to store per-language `Wordset`s (`wordsMap`) and pick a language uniformly per generated word.
- Updates word generation logic to use `PolyglotWordset.currentLanguage` instead of per-word language lookups.
- Small control-flow cleanup in word list retrieval / funbox wordset assignment.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| frontend/src/ts/test/words-generator.ts | Switches polyglot language detection to currentLanguage; minor cleanup around word list returns and funbox wordset assignment. |
| frontend/src/ts/test/funbox/funbox-functions.ts | Reworks PolyglotWordset internals to use wordsMap and uniform language selection. |
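The uniform-sampling behavior described above can be sketched as follows. This is a minimal illustration under assumptions: the `Wordset` base class here is simplified, and only the names `PolyglotWordset`, `wordsMap`, and `currentLanguage` are taken from the PR; the rest is not the actual Monkeytype implementation.

```typescript
// Simplified stand-in for the real Wordset base class (assumption).
class Wordset {
  constructor(public words: string[]) {}
  randomWord(): string {
    return this.words[Math.floor(Math.random() * this.words.length)]!;
  }
}

class PolyglotWordset extends Wordset {
  currentLanguage = "";
  private langs: string[];

  constructor(private wordsMap: Map<string, Wordset>) {
    // Aggregate every language's words so consumers that read `words`
    // directly still see a non-empty list (see review comment below).
    super(Array.from(wordsMap.values()).flatMap((ws) => ws.words));
    this.langs = Array.from(wordsMap.keys());
  }

  override randomWord(): string {
    // Pick the language uniformly first, then a word from that language's
    // wordset, so small dictionaries are sampled as often as large ones.
    this.currentLanguage =
      this.langs[Math.floor(Math.random() * this.langs.length)]!;
    return this.wordsMap.get(this.currentLanguage)!.randomWord();
  }
}
```

The key design point is the two-stage draw: sampling the language before the word makes each language's expected share 1/N regardless of dictionary size, whereas sampling from the aggregated word list would weight languages by word count.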
```ts
super([]);
this.languageProperties = languageProperties;
this.langs = Array.from(languageProperties.keys());
this.wordsMap = wordsMap;
this.resetIndexes();
this.length = Array.from(this.wordsMap.values()).reduce(
  (sum, ws) => sum + ws.words.length,
  0,
);
```
`PolyglotWordset` now calls `super([])` and never populates `this.words`. Code in words-generator relies on `currentWordset.words` (e.g. the repeat-quote path and the hasTab/hasNewline checks), so polyglot wordsets will report an empty word list. Consider keeping `words` as an aggregate of all per-language words (or overriding/adding an accessor and updating consumers to use it).
```suggestion
const aggregatedWords = Array.from(wordsMap.values()).reduce<string[]>(
  (all, ws) => all.concat(ws.words),
  [],
);
super(aggregatedWords);
this.languageProperties = languageProperties;
this.langs = Array.from(languageProperties.keys());
this.wordsMap = wordsMap;
this.resetIndexes();
this.length = aggregatedWords.length;
```
```ts
}

override resetIndexes(): void {
  this.wordsMap.forEach((ws, _) => {
```
`resetIndexes()` uses `this.wordsMap.forEach((ws, _) => ...)` but the key parameter is unused. Drop the second argument to avoid unused-var lint noise.
```suggestion
this.wordsMap.forEach(ws => {
```
Add uniform distribution to polyglot
Motivation
While using polyglot, I noticed that words from languages with huge dictionaries appear constantly, while words from small ones almost never do (at least that is how it looks from a user-experience perspective). This PR solves that issue: each language now contributes roughly equally to each test.
Description
Changed:
Checks
- `packages/schemas/src/languages.ts`
- `frontend/src/ts/constants/languages.ts`
- `frontend/static/languages`
- `packages/schemas/src/themes.ts`
- `frontend/src/ts/constants/themes.ts`
- `frontend/static/themes`
- `packages/schemas/src/layouts.ts`
- `frontend/static/layouts`
- `frontend/static/webfonts`
- `packages/schemas/src/fonts.ts`
- `frontend/src/ts/constants/fonts.ts`

Closes #